Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

151640
by New Contributor III
  • 4439 Views
  • 4 replies
  • 3 kudos

Resolved! Is there a known issue with how the Databricks JDBC driver handles non-ASCII characters such as Japanese?

A Parquet file contains character data in various languages, which the Data Explorer UI displays. However, a simple "select *" query using the Databricks JDBC driver (version 2.6.29) with a tool such as SQuirreL SQL displays invalid characters.

Latest Reply
151640
New Contributor III
  • 3 kudos

The issue encountered has been confirmed to be a defect in the Databricks JDBC driver.

3 More Replies
JD410993
by New Contributor II
  • 3383 Views
  • 3 replies
  • 2 kudos

Job runs indefinitely after integrating with PyDeequ

I'm using PyDeequ data quality checks in one of our jobs. After adding this check, I noticed that the job does not complete and keeps running indefinitely after the PyDeequ checks are completed and results are returned. As stated in the PyDeequ documentation ...

Latest Reply
-werners-
Esteemed Contributor III
  • 2 kudos

Hm, Deequ certainly works, as I have read about multiple people using it. And when reading the issues (open/closed) on the GitHub pages of PyDeequ, Databricks is mentioned in some of them, so it might be possible after all. But I think you need to check y...

2 More Replies
KVNARK
by Honored Contributor II
  • 4578 Views
  • 4 replies
  • 6 kudos

Resolved! How to parameterize the key of the Spark config in the job cluster linked service from ADF

How can we parameterize the key of the spark-config in the job cluster linked service from Azure Data Factory? We can parameterize the values, but is there any way to parameterize the key so that, when deploying to further environments, it takes the PROD/QA v...

Latest Reply
daniel_sahal
Databricks MVP
  • 6 kudos

@KVNARK You can use Databricks Secrets (create a secret scope from AKV: https://learn.microsoft.com/en-us/azure/databricks/security/secrets/secret-scopes) and then reference a secret in the Spark configuration (https://learn.microsoft.com/en-us/azure/d...
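
To illustrate the suggestion above: Databricks allows a Spark configuration value to reference a secret using the `{{secrets/<scope>/<key>}}` syntax. The sketch below only shows how a per-environment Spark config (both key and value) could be assembled in Python. The environment names, storage account names, secret scope names, and the secret key name are all hypothetical, and the thread does not confirm that this is how the poster solved the ADF side of the problem.

```python
# Hypothetical per-environment settings; none of these names come from the thread.
ENVIRONMENTS = {
    "qa": {"storage_account": "examplestorageqa", "secret_scope": "akv-qa"},
    "prod": {"storage_account": "examplestorageprod", "secret_scope": "akv-prod"},
}


def build_spark_conf(env: str, secret_key: str = "storage-account-key") -> dict:
    """Return a spark_conf dict whose key and value both depend on the environment."""
    settings = ENVIRONMENTS[env]
    # The config key embeds the environment-specific storage account name.
    conf_key = (
        f"fs.azure.account.key.{settings['storage_account']}.dfs.core.windows.net"
    )
    # The value is a secret reference that Databricks resolves at cluster start.
    conf_value = f"{{{{secrets/{settings['secret_scope']}/{secret_key}}}}}"
    return {conf_key: conf_value}


if __name__ == "__main__":
    print(build_spark_conf("prod"))
```

The resulting dictionary could then be passed wherever the cluster definition accepts a Spark config, so only the environment name changes between deployments.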

3 More Replies
Orianh
by Valued Contributor II
  • 5348 Views
  • 2 replies
  • 1 kudos

Resolved! Attach instance profile to service principal.

Hey guys, I'm having some permission issues using a service principal and an instance profile, and I hope you can help me. I created a service principal and attached an instance profile to it: databricks-my-profile. I have an S3 bucket with a policy that all...

Latest Reply
Orianh
Valued Contributor II
  • 1 kudos

Hey @Kaniz Fatma, @Debayan Mukherjee, thanks for your answers. Actually, Databricks does not support using the DBFS API with a service principal and attached instance profile on a mounted S3 bucket. I'm not sure if this exists in the docs (I might have missed it), but thi...

1 More Replies
chanansh
by Contributor
  • 9491 Views
  • 2 replies
  • 0 kudos

Relative path in absolute URI when reading a folder with files containing ":" colons in filename

I am trying to read a folder with partition files where each partition is date/hour/timestamp.csv, where timestamp is the exact timestamp in ISO format, e.g. 09-2022-12-05T20:35:15.2786966Z. It seems Spark has issues reading files with col...

Latest Reply
Debayan
Databricks Employee
  • 0 kudos

The issue was reopened: https://issues.apache.org/jira/browse/HDFS-14762
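
The error in the title typically arises because Hadoop's path handling can interpret the text before a colon as a URI scheme. One workaround that is sometimes used, offered here as an assumption and not something the thread confirms, is to rename the files so that their names no longer contain colons before Spark lists them. A minimal sketch of the renaming logic, using a hypothetical path layout:

```python
import posixpath


def sanitize_filename(name: str, replacement: str = "_") -> str:
    """Replace every colon in a file name with a safe character."""
    return name.replace(":", replacement)


def plan_renames(paths, replacement: str = "_"):
    """Return (old_path, new_path) pairs for paths whose file name contains a colon.

    Only the final path component is changed; directories are left untouched.
    """
    renames = []
    for path in paths:
        directory, filename = posixpath.split(path)
        if ":" in filename:
            new_name = sanitize_filename(filename, replacement)
            renames.append((path, posixpath.join(directory, new_name)))
    return renames


if __name__ == "__main__":
    # Hypothetical example path modeled on the date/hour/timestamp.csv layout.
    example = ["2022-12-05/20/2022-12-05T20:35:15.2786966Z.csv"]
    for old, new in plan_renames(example):
        print(old, "->", new)
```

The function only plans the renames; applying them depends on the storage layer in use (for example, a cloud SDK or a file system utility), which is outside the scope of this sketch.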

1 More Replies
tariq
by New Contributor III
  • 13150 Views
  • 5 replies
  • 7 kudos

Databricks Azure Blob Storage access

I am trying to access files stored in Azure Blob Storage and have followed the documentation linked below: https://docs.databricks.com/external-data/azure-storage.html I was successful in mounting the Azure Blob Storage on DBFS, but it seems that the me...

Latest Reply
Debayan
Databricks Employee
  • 7 kudos

Hi @Ravindra Ch, could you please check the firewall settings in Azure networking?

4 More Replies
wim_schmitz_per
by New Contributor II
  • 5515 Views
  • 2 replies
  • 2 kudos

Transforming/Saving Python Class Instances to Delta Rows

I'm trying to reuse a Python package to do a very complex series of steps that parse binary files into workable data in Delta format. I have made the first part (binary file parsing) work with a UDF: asffileparser = F.udf(File()._parseBytes,AsfFileDelta.getSch...

Latest Reply
Debayan
Databricks Employee
  • 2 kudos

Hi, did you try to follow "Fix it by registering a custom IObjectConstructor for this class."? Also, could you please provide the full error?

1 More Replies
ramravi
by Contributor II
  • 3280 Views
  • 1 reply
  • 0 kudos

Unable to connect to databricks cluster from Windows using databricks-connect

I am trying to set up databricks-connect on my Windows machine. While running databricks-connect test, I get the error below, complaining that a Java certificate is not found. ''Caused by: sun.security.validator.ValidatorException: PKIX path building fail...

Latest Reply
ramravi
Contributor II
  • 0 kudos

Adding the certificate from the root level worked for me. This problem is solved.

dotan
by New Contributor II
  • 3541 Views
  • 3 replies
  • 2 kudos

Poor Auto Loader performance with CSV files on S3

I set up a notebook to ingest data using Auto Loader from an S3 bucket that contains over 500K CSV files into a Hive table. Recently the number of rows (and input files) in the table grew from around 150M to 530M, and now each batch takes around an hour...

Latest Reply
Anonymous
Not applicable
  • 2 kudos

Hi @Dotan Schachter, hope all is well! Just wanted to check in: were you able to resolve your issue, and would you be happy to share the solution or mark an answer as best? Otherwise, please let us know if you need more help. We'd love to hear from you. Th...

2 More Replies
SQL_DB
by New Contributor II
  • 2987 Views
  • 1 reply
  • 1 kudos

Sharing CSV export from a dashboard

Is it possible to schedule a refresh and share a table visual from a dashboard in CSV format? Also, is it possible to share only one visual from a dashboard that contains more than one?

Latest Reply
Anonymous
Not applicable
  • 1 kudos

Hi @Sujitha Bommayan, hope everything is going great. Does @Kaniz Fatma's response answer your question? If yes, would you be happy to mark it as best so that other members can find the solution more quickly? We'd love to hear from you. Thanks!

Abhijeet
by New Contributor III
  • 5476 Views
  • 4 replies
  • 5 kudos

How to Read Terabytes of data in Databricks

I want to read 1000 GB of data. Since Spark does in-memory transformations, do I need worker nodes with a combined memory of 1000 GB? I also want to understand whether reading will store the full 1000 GB in memory. And how is a cached DataFrame different from the a...
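
As background for this question: Spark generally reads file-based sources in partitions and processes them task by task, so the whole input does not have to fit in memory at once. A rough back-of-the-envelope estimate, assuming the default `spark.sql.files.maxPartitionBytes` of 128 MB and a hypothetical cluster of 64 cores, can be sketched as follows. The real partition count also depends on file sizes, file formats, and other settings, so treat the numbers as an illustration only.

```python
import math

MIB = 1024 ** 2
GIB = 1024 ** 3


def estimate_input_partitions(total_bytes: int, max_partition_bytes: int = 128 * MIB) -> int:
    """Rough number of input partitions, ignoring file boundaries and open costs."""
    return math.ceil(total_bytes / max_partition_bytes)


def estimate_in_flight_bytes(total_cores: int, max_partition_bytes: int = 128 * MIB) -> int:
    """Rough amount of raw input being read concurrently, one partition per core."""
    return total_cores * max_partition_bytes


if __name__ == "__main__":
    partitions = estimate_input_partitions(1000 * GIB)
    in_flight = estimate_in_flight_bytes(total_cores=64)  # 64 cores is a made-up example
    print(f"~{partitions} input partitions")
    print(f"~{in_flight / GIB:.0f} GiB of raw input in flight at a time")
```

Under these assumptions, 1000 GiB splits into about 8000 partitions, and only a small fraction of the input is being read at any moment. Caching, wide shuffles, and decompression change the memory picture and are not modeled here.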

Latest Reply
Ajay-Pandey
Databricks MVP
  • 5 kudos

Hi @Abhijeet Singh, the blog below might help you: Link

3 More Replies
Ulf
by New Contributor II
  • 1689 Views
  • 1 reply
  • 0 kudos

Github and task integration

I have the same problem as described in this post (https://community.databricks.com/s/question/0D58Y00009ObQgdSAF/running-jobs-using-notebooks-in-a-remote-azure-devops-services-repos-git-repository-is-generating-notebook-not-found-error) and get this...

Latest Reply
Debayan
Databricks Employee
  • 0 kudos

Hi, could you please check and let us know if this helps? https://community.databricks.com/s/question/0D53f00001GHVTNCA5/notebook-path-cant-be-in-dbfs

Etyr
by Contributor II
  • 9434 Views
  • 4 replies
  • 4 kudos

Resolved! Generate longer token for Databricks with Azure.

I'm using DefaultAzureCredential from azure-identity to connect to Azure with service principal environment variables (AZURE_CLIENT_SECRET, AZURE_TENANT_ID, AZURE_CLIENT_ID). I can call get_token for a specific scope for Databricks like this: from azure.id...

Latest Reply
Etyr
Contributor II
  • 4 kudos

I came up with an alternative solution: I wrote my own Python class to handle my PAT from Databricks: https://stackoverflow.com/questions/75071869/python-defaultazurecredential-get-token-set-expiration-or-renew-token/ You can be fancier or even register...
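
The general idea of a self-renewing token wrapper can be sketched as below. This is not the poster's class (that lives behind the Stack Overflow link); it is a minimal illustration with hypothetical names (`RenewingToken`, `fetch`, `refresh_margin_seconds`). It assumes the caller supplies a function returning a token string and its expiry as epoch seconds, which matches the `token` and `expires_on` fields of the `AccessToken` returned by azure-identity's `get_token`.

```python
import time
from typing import Callable, Optional, Tuple


class RenewingToken:
    """Cache a token and fetch a new one shortly before it expires."""

    def __init__(
        self,
        fetch: Callable[[], Tuple[str, float]],
        refresh_margin_seconds: float = 300.0,
        clock: Callable[[], float] = time.time,
    ) -> None:
        self._fetch = fetch          # returns (token, expires_on_epoch_seconds)
        self._margin = refresh_margin_seconds
        self._clock = clock          # injectable so the class can be tested
        self._token: Optional[str] = None
        self._expires_on: float = 0.0

    def get(self) -> str:
        """Return the cached token, renewing it if it is missing or near expiry."""
        now = self._clock()
        if self._token is None or now >= self._expires_on - self._margin:
            self._token, self._expires_on = self._fetch()
        return self._token


# Possible wiring with azure-identity (not executed here; `scope` is whatever
# Databricks scope string you already pass to get_token):
#
#   credential = DefaultAzureCredential()
#   def fetch():
#       access = credential.get_token(scope)
#       return access.token, access.expires_on
#   token = RenewingToken(fetch)
```

Callers then use `token.get()` wherever they previously used the raw token string, so long-running processes pick up a fresh token without restarting.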

3 More Replies
Etyr
by Contributor II
  • 12613 Views
  • 3 replies
  • 2 kudos

Resolved! Slow fetching of results by the client in Databricks SQL when calling from an Azure Compute Instance (AML)

I'm using `databricks-sql-connector` in Python 3.8 to connect to an Azure SQL Warehouse from inside an Azure Machine Learning Compute Instance. I have a query with a large result. Looking at the `query history`, I check the time spent on the query, and se...

Latest Reply
Etyr
Contributor II
  • 2 kudos

So I ran a few tests. Since you said that the Databricks SQL driver wasn't made to retrieve that amount of data, I switched to Spark. I fired up a small Spark cluster, the query was as fast as on the SQL Warehouse, then I did a df.write.parquet("/my_path/...

2 More Replies
Tacuma
by New Contributor II
  • 3121 Views
  • 4 replies
  • 1 kudos

Scheduling jobs with Airflow results in each task running multiple jobs.

Hey everyone, I'm experimenting with running containerized PySpark jobs in Databricks and orchestrating them with Airflow. However, I am encountering an issue. When I trigger an Airflow DAG and look at the logs, I see that Airflow is spinni...

Latest Reply
Tacuma
New Contributor II
  • 1 kudos

Both, I guess? Yes, all jobs share the same config. My question is why there are 3 job runs in the same Airflow task log. I'm hoping there's something in the configs that may give me some kind of clue.

3 More Replies
