Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

databicky
by Contributor II
  • 6431 Views
  • 6 replies
  • 1 kudos

Resolved! how to check dataframe column value

In my dataframe I have a column named count. If that column's value is greater than zero, the job needs to fail. How can I do that?

Latest Reply
Hubert-Dudek
Esteemed Contributor III

Code without collect (which should not be used in production):
if df.filter("count > 0").count() > 0: dbutils.notebook.exit('Notebook Failed')
You can also use a more aggressive version:
if df.filter("count > 0").count() > 0: raise Exception("count bigge...
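For readability, the same check as a short runnable sketch (assumes a Databricks notebook where spark and dbutils are available and a DataFrame df with a numeric count column; the messages are illustrative):

# Fail the run as soon as any row has count > 0.
if df.filter("count > 0").count() > 0:
    dbutils.notebook.exit("Notebook Failed")     # soft exit: stops this notebook run
    # raise Exception("count bigger than 0")     # harder failure: marks the job task as failed

dbutils.notebook.exit only ends the current notebook run, while raising an exception makes the job task itself show up as failed.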

5 More Replies
151640
by New Contributor III
  • 3500 Views
  • 4 replies
  • 3 kudos

Resolved! Is there a known issue regarding Databricks JDBC driver character values such as Japanese etc?

A Parquet file contains character data for various languages and displays correctly in the Data Explorer UX. A simple "select *" query using the Databricks JDBC driver (version 2.6.29) with a tool such as SQLSquirrel displays invalid characters.

Latest Reply
151640
New Contributor III

The issue encountered has been confirmed to be a defect in the Databricks JDBC driver.

3 More Replies
JD410993
by New Contributor II
  • 2603 Views
  • 3 replies
  • 2 kudos

Job runs indefinitely after integrating with PyDeequ

I'm using PyDeequ data quality checks in one of our jobs. After adding this check, I noticed that the job does not complete and keeps running indefinitely after the PyDeequ checks are completed and results are returned. As stated in the PyDeequ documentation ...

Latest Reply
-werners-
Esteemed Contributor III

Hm, Deequ certainly works, as I have read about multiple people using it. And reading the issues (open/closed) on the GitHub pages of PyDeequ, Databricks is mentioned in some of them, so it might be possible after all. But I think you need to check y...
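One common culprit worth checking (hedged; this comes from the PyDeequ README rather than from this thread): PyDeequ starts a Py4J callback server that can keep the Spark application alive after the checks finish unless it is shut down explicitly. A minimal sketch under that assumption, using an existing Databricks spark session:

from pydeequ.checks import Check, CheckLevel
from pydeequ.verification import VerificationSuite

df = spark.createDataFrame([(1, "a"), (2, None)], ["id", "name"])

result = (VerificationSuite(spark)
          .onData(df)
          .addCheck(Check(spark, CheckLevel.Error, "basic checks").isComplete("id"))
          .run())

# PyDeequ starts a Py4J callback server; shutting it down lets the job finish.
spark.sparkContext._gateway.shutdown_callback_server()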

2 More Replies
KVNARK
by Honored Contributor II
  • 3781 Views
  • 4 replies
  • 6 kudos

Resolved! How to parameterize key of spark config in the job cluster linked service from ADF

How can we parameterize the key of the spark config in the job cluster linked service from Azure Data Factory? We can parameterize the values, but is there any way to parameterize the key so that when deploying to further environments it takes the PROD/QA v...

Latest Reply
daniel_sahal
Esteemed Contributor

@KVNARK You can use Databricks Secrets (create a Secret scope from AKV https://learn.microsoft.com/en-us/azure/databricks/security/secrets/secret-scopes) and then reference a secret in the Spark configuration (https://learn.microsoft.com/en-us/azure/d...
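For reference, a hedged sketch of that pattern (scope, secret, and storage-account names below are placeholders): the per-environment value lives in the AKV-backed secret scope, and the Spark config or notebook only refers to it by name.

# On the cluster, a Spark config value can reference the secret directly, e.g.
#   fs.azure.account.key.<storage-account>.dfs.core.windows.net  {{secrets/<scope>/<secret>}}
# The same secret is reachable at runtime from a notebook:
account_key = dbutils.secrets.get(scope="my-akv-scope", key="storage-account-key")
spark.conf.set("fs.azure.account.key.mystorageaccount.dfs.core.windows.net", account_key)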

3 More Replies
Orianh
by Valued Contributor II
  • 4358 Views
  • 2 replies
  • 1 kudos

Resolved! Attach instance profile to service principal.

Hey guys, I'm having some permission issues using a service principal and instance profile, and I hope you could help me. I created a service principal and attached an instance profile to it - databricks-my-profile. I have an S3 bucket with a policy that all...

Latest Reply
Orianh
Valued Contributor II

Hey @Kaniz Fatma, @Debayan Mukherjee, thanks for your answers. Actually, Databricks does not support using the DBFS API with a service principal and attached instance profile on a mounted S3 bucket. I'm not sure if this is in the docs (I might have missed it), but thi...

1 More Replies
chanansh
by Contributor
  • 8070 Views
  • 2 replies
  • 0 kudos

Relative path in absolute URI when reading a folder with files containing colons (":") in the filename

I am trying to read a folder with partition files where each partition is date/hour/timestamp.csv, where timestamp is the exact timestamp in ISO format, e.g. 09-2022-12-05T20:35:15.2786966Z. It seems like Spark has issues reading files with col...
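Hadoop's path parser treats ":" as a URI scheme separator, which is why such file names fail (see the JIRA linked in the reply below). A hedged workaround sketch, not taken from this thread: copy the objects to colon-free keys with plain boto3 (which bypasses Hadoop path parsing) before reading them with Spark. Bucket and prefixes are hypothetical.

import boto3

s3 = boto3.client("s3")
bucket = "my-bucket"
src_prefix, dst_prefix = "raw/", "raw_clean/"

for page in s3.get_paginator("list_objects_v2").paginate(Bucket=bucket, Prefix=src_prefix):
    for obj in page.get("Contents", []):
        key = obj["Key"]
        clean_key = dst_prefix + key[len(src_prefix):].replace(":", "-")
        s3.copy_object(Bucket=bucket, Key=clean_key,
                       CopySource={"Bucket": bucket, "Key": key})

df = spark.read.csv(f"s3://{bucket}/{dst_prefix}", header=True)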

Latest Reply
Debayan
Databricks Employee

The issue has been reopened: https://issues.apache.org/jira/browse/HDFS-14762

1 More Replies
tariq
by New Contributor III
  • 10714 Views
  • 5 replies
  • 7 kudos

Databricks Azure Blob Storage access

I am trying to access files stored in Azure Blob Storage and have followed the documentation linked below: https://docs.databricks.com/external-data/azure-storage.html. I was successful in mounting the Azure Blob Storage on DBFS, but it seems that the me...
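For context, a minimal sketch of the account-key pattern from that documentation (storage account, container, scope, and secret names are placeholders):

storage_account = "mystorageaccount"
container = "mycontainer"

spark.conf.set(
    f"fs.azure.account.key.{storage_account}.blob.core.windows.net",
    dbutils.secrets.get(scope="my-scope", key="storage-account-key"),
)

df = spark.read.csv(
    f"wasbs://{container}@{storage_account}.blob.core.windows.net/path/to/data",
    header=True,
)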

Latest Reply
Debayan
Databricks Employee

Hi @Ravindra Ch, could you please check the firewall settings in Azure networking?

4 More Replies
wim_schmitz_per
by New Contributor II
  • 4392 Views
  • 2 replies
  • 2 kudos

Transforming/Saving Python Class Instances to Delta Rows

I'm trying to reuse a Python package to do a very complex series of parsing binary files into workable data in Delta format. I have made the first part (binary file parsing) work with a UDF: asffileparser = F.udf(File()._parseBytes, AsfFileDelta.getSch...
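A hedged, generic sketch of the pattern (the File and AsfFileDelta classes from the question are not reproduced): have the UDF return plain Python values that match an explicit StructType, so no custom class instances need to be serialized back to Spark. Paths and field names are placeholders.

from pyspark.sql import functions as F
from pyspark.sql.types import StructType, StructField, StringType, IntegerType

parsed_schema = StructType([
    StructField("name", StringType()),
    StructField("size", IntegerType()),
])

def parse_bytes(content):
    # stand-in for the package's real parser; return only plain Python types
    return {"name": "example", "size": len(content) if content else 0}

parse_udf = F.udf(parse_bytes, parsed_schema)

df = spark.read.format("binaryFile").load("/mnt/raw/binaries")           # hypothetical path
result = df.withColumn("parsed", parse_udf(F.col("content")))
result.write.format("delta").mode("append").save("/mnt/delta/parsed")    # hypothetical path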

Latest Reply
Debayan
Databricks Employee

Hi, did you try to follow "Fix it by registering a custom IObjectConstructor for this class"? Also, could you please provide us with the full error?

1 More Replies
ramravi
by Contributor II
  • 2634 Views
  • 1 replies
  • 0 kudos

Unable to connect to databricks cluster from Windows using databricks-connect

I am trying to set up databricks-connect on my Windows machine. While running databricks-connect test I am getting the below error complaining that a Java certificate is not found: "Caused by: sun.security.validator.ValidatorException: PKIX path building fail...

Latest Reply
ramravi
Contributor II

Adding the certificate from the root level worked for me. This problem is solved.

dotan
by New Contributor II
  • 2378 Views
  • 3 replies
  • 2 kudos

Poor Auto Loader performance with CSV files on S3

I set up a notebook to ingest data using Auto Loader from an S3 bucket that contains over 500K CSV files into a Hive table. Recently the number of rows (and input files) in the table grew from around 150M to 530M, and now each batch takes around an hour...
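A hedged sketch, not from this thread: with hundreds of thousands of input files, directory listing often dominates each Auto Loader batch, so file-notification mode and a bounded batch size are the usual knobs to try. Paths, table names, and checkpoint locations are placeholders.

df = (spark.readStream.format("cloudFiles")
      .option("cloudFiles.format", "csv")
      .option("cloudFiles.useNotifications", "true")       # avoid full directory listings
      .option("cloudFiles.maxFilesPerTrigger", 10000)      # cap the work per micro-batch
      .option("cloudFiles.schemaLocation", "/mnt/checkpoints/ingest/schema")
      .option("header", "true")
      .load("s3://my-bucket/csv-input/"))

(df.writeStream
   .option("checkpointLocation", "/mnt/checkpoints/ingest")
   .trigger(availableNow=True)      # process the backlog and stop (newer DBR)
   .toTable("my_database.my_table"))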

Latest Reply
Anonymous
Not applicable

Hi @Dotan Schachter, hope all is well! Just wanted to check in: were you able to resolve your issue, and would you be happy to share the solution or mark an answer as best? Otherwise, please let us know if you need more help. We'd love to hear from you. Th...

2 More Replies
SQL_DB
by New Contributor II
  • 2474 Views
  • 1 replies
  • 1 kudos

Sharing CSV export from a dashboard

Is it possible to schedule a refresh and share a CSV export of a table visual in a dashboard? Also, is it possible to share only one visual from a dashboard when there is more than one?

Latest Reply
Anonymous
Not applicable

Hi @Sujitha Bommayan, hope everything is going great. Does @Kaniz Fatma's response answer your question? If yes, would you be happy to mark it as best so that other members can find the solution more quickly? We'd love to hear from you. Thanks!

Abhijeet
by New Contributor III
  • 4243 Views
  • 4 replies
  • 5 kudos

How to Read Terabytes of data in Databricks

I want to read 1000 GB of data. Since Spark does transformations in memory, do I need worker nodes with a combined memory of 1000 GB? Also, I just want to understand whether reading stores all 1000 GB in memory, and how a cached DataFrame is different from the a...

Latest Reply
Ajay-Pandey
Esteemed Contributor III

Hi @Abhijeet Singh, the blog below might help you - Link

3 More Replies
Ulf
by New Contributor II
  • 1333 Views
  • 1 replies
  • 0 kudos

GitHub and task integration

I have the same problem as described in this post (https://community.databricks.com/s/question/0D58Y00009ObQgdSAF/running-jobs-using-notebooks-in-a-remote-azure-devops-services-repos-git-repository-is-generating-notebook-not-found-error) and get this...

Latest Reply
Debayan
Databricks Employee

Hi, could you please check and let us know if this helps: https://community.databricks.com/s/question/0D53f00001GHVTNCA5/notebook-path-cant-be-in-dbfs

Etyr
by Contributor
  • 7388 Views
  • 4 replies
  • 4 kudos

Resolved! Generate longer token for Databricks with Azure.

I'm using DefaultAzureCredential from azure-identity to connect to Azure with service principal environment variables (AZURE_CLIENT_SECRET, AZURE_TENANT_ID, AZURE_CLIENT_ID). I can get_token from a specific scope for Databricks like this: from azure.id...
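For reference, a hedged sketch of that call (the GUID is the well-known Azure Databricks resource ID used as the token scope; whether it matches this setup is an assumption):

from azure.identity import DefaultAzureCredential

credential = DefaultAzureCredential()   # picks up AZURE_CLIENT_ID / AZURE_CLIENT_SECRET / AZURE_TENANT_ID
token = credential.get_token("2ff814a6-3304-4ab8-85cb-cd0e6f879c1d/.default")

print(token.expires_on)   # AAD access tokens are short-lived (typically around an hour)

That short lifetime is what the accepted reply below works around by renewing the token in a small helper class.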

Latest Reply
Etyr
Contributor

I came up with an alternative solution: I wrote my own Python class to handle my PAT from Databricks: https://stackoverflow.com/questions/75071869/python-defaultazurecredential-get-token-set-expiration-or-renew-token/ You can be fancier or even register...

3 More Replies
Etyr
by Contributor
  • 9528 Views
  • 3 replies
  • 2 kudos

Resolved! slow Fetching results by client in databricks SQL calling from Azure Compute Instance (AML)

I'm using `databricks-sql-connector` in Python 3.8 to connect to an Azure SQL Warehouse from inside an Azure Machine Learning Compute Instance. I have a query with a large result; looking at the `query history` I checked the time spent on the query, and se...
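For context, a hedged sketch of the databricks-sql-connector usage being described (hostname, HTTP path, token, and table name are placeholders); fetching results as Arrow avoids row-by-row conversion, which matters for large result sets:

from databricks import sql

with sql.connect(server_hostname="adb-1234567890123456.7.azuredatabricks.net",
                 http_path="/sql/1.0/warehouses/abcdef1234567890",
                 access_token="dapi-example-token") as connection:
    with connection.cursor() as cursor:
        cursor.execute("SELECT * FROM my_catalog.my_schema.big_table")
        arrow_table = cursor.fetchall_arrow()   # pyarrow.Table, fetched in batches
        pdf = arrow_table.to_pandas()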

Latest Reply
Etyr
Contributor

So I ran a few tests. Since you said that the Databricks SQL driver wasn't made to retrieve that amount of data, I went with Spark. I fired up a small Spark cluster; the query was as fast as on the SQL Warehouse, and then I did a df.write.parquet("/my_path/...

2 More Replies
