Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

p_romm
by New Contributor III
  • 1147 Views
  • 4 replies
  • 0 kudos

Structured Streaming writeStream - Query is no longer active causes task to fail

Hi, I run readStream/writeStream in a workflow task. The write stream uses the .trigger(availableNow=True) option. After writeStream, I wait for the query to finish with query.awaitTermination(). However, from time to time the pipeline ends with "Query <id> is no ...

Latest Reply
cmathieu
New Contributor III
  • 0 kudos

@Alberto_Umana this bug was apparently fixed a few months ago, but we're still facing the same issue on our end. 

3 More Replies
397973
by New Contributor III
  • 1110 Views
  • 1 reply
  • 1 kudos

Resolved! Several unavoidable for loops are slowing this PySpark code. Is it possible to improve it?

Hi. I have a PySpark notebook that takes 25 minutes to run, as opposed to one minute with on-prem Linux + Pandas. How can I speed it up? It's not a volume issue. The input is around 30k rows. Output is the same because there's no filtering or aggregation...

Latest Reply
lingareddy_Alva
Honored Contributor III
  • 1 kudos

@397973 Spark is optimized for 100s of GB or millions of rows, NOT small in-memory lookups with heavy control flow (unless engineered carefully). That's why Pandas is much faster for your specific case now.

Pre-load and Broadcast All Mappings
Instead of...

Lo
by New Contributor II
  • 1569 Views
  • 1 reply
  • 0 kudos

SocketTimeoutException when creating execution context in Databricks Community Edition

Hello, I’m experiencing an issue in Databricks Community Edition. When I try to run a notebook, I get this error: "Exception when creating execution context: java.net.SocketTimeoutException: connect Timeout"
What I have tried:
- Restarting the cluster
- Ch...

Latest Reply
Advika
Databricks Employee
  • 0 kudos

Hello @Lo! There is a similar thread where another user encountered the same issue and shared a solution that worked for them. I suggest reviewing that thread to see if the solution is helpful in your case as well.

vidya_kothavale
by Contributor
  • 1263 Views
  • 1 reply
  • 1 kudos

Issue reading Vertica table into Databricks - Numeric value out of range

I am trying to read a Vertica table into a Spark DataFrame using JDBC in Databricks. Here is my sample code:
hostname = ""
username = ""
password = ""
database_port = ""
database_name = ""
qry_col_level = f"""SELECT * FROM analytics_DS.ansh_units_cum_dash""...

Latest Reply
Renu_
Valued Contributor II
  • 1 kudos

Hi @vidya_kothavale, based on my research and understanding, Databricks and Spark's JDBC connectors currently don’t offer an automatic way to truncate or round high precision decimal values when loading data. To handle this, you would need to either:...

kweks970
by New Contributor
  • 2701 Views
  • 1 reply
  • 0 kudos

DEV and PROD

A "SELECT * FROM" query on my table in PROD returns all the rows of data (historical data), but the same query on my table in DEV returns just one row (the current row of the historical data). What could be the problem?

Latest Reply
Louis_Frolio
Databricks Employee
  • 0 kudos

Please don't cross post.  Thanks, Louis.

AlexMc
by New Contributor III
  • 1366 Views
  • 6 replies
  • 1 kudos

Resolved! GET /api/2.2/jobs/list Ordering

Hi there! I am calling the job list API (via the Python SDK):
GET /api/2.2/jobs/list
docs.databricks.com/api/workspace/jobs/list
Does anyone know what ordering is applied / calculated for the list of jobs? Is it consistent or random? Is it by creation tim...
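If the order returned by the API matters to downstream code, one option is to not rely on it and sort client-side instead. The sketch below is only an illustration: it assumes each job has already been fetched and converted to a plain dict carrying the `job_id` and `created_time` fields, and the sample values are made up.

```python
from typing import Dict, Iterable, List


def sort_jobs(jobs: Iterable[Dict]) -> List[Dict]:
    """Return jobs in a deterministic order: oldest first, ties broken by job_id."""
    return sorted(jobs, key=lambda job: (job["created_time"], job["job_id"]))


# Hypothetical sample data standing in for an API response.
sample_jobs = [
    {"job_id": 30, "created_time": 1700000002000},
    {"job_id": 10, "created_time": 1700000001000},
    {"job_id": 20, "created_time": 1700000001000},
]

ordered_ids = [job["job_id"] for job in sort_jobs(sample_jobs)]
```

Sorting on a tuple keeps the result stable across calls even when two jobs share the same creation timestamp.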

Latest Reply
AlexMc
New Contributor III
  • 1 kudos

Thanks both - this was very helpful!

5 More Replies
Christian_C
by New Contributor II
  • 1798 Views
  • 7 replies
  • 0 kudos

Google Pub Sub and Delta live table

I am using Delta Live Tables and Pub/Sub to ingest messages from 30 different topics in parallel. I noticed that initialization time can be very long, around 15 minutes. Does anyone know how to reduce initialization time in DLT? Thank you.

Latest Reply
Louis_Frolio
Databricks Employee
  • 0 kudos

Classic clusters can take up to seven minutes to be acquired, configured, and deployed, with most of this time spent waiting for the cloud service to allocate virtual machines. In contrast, serverless clusters typically start in under eight seconds. ...

6 More Replies
BF7
by Contributor
  • 1240 Views
  • 3 replies
  • 2 kudos

Resolved! How can we get AutoLoader to detect a file footer?

We are dealing with CSVs that have footers in them. When we have an empty file, the presence of this footer seems to impair AutoLoader's schema inference. I know there is a header = true parameter, but I don't see a foote...

Latest Reply
Louis_Frolio
Databricks Employee
  • 2 kudos

To be clear, when you say footer are you referring to the last row of the tuple?  e.g. Header = row 1, Footer = row_last.  

2 More Replies
Yuki
by Contributor
  • 948 Views
  • 2 replies
  • 1 kudos

Resolved! Can we implement Unity Catalog table lifecycle?

I want to delete tables that haven't been selected or otherwise accessed for several months. I can see the Delta table history, but I can only catch the DDL or update/insert/delete and can't catch "select". I realized that the Unity Catalog insight, ht...

Latest Reply
Yuki
Contributor
  • 1 kudos

Hi @Renu_, I appreciate your clear response. I now have a better understanding and will work with our admin team to develop a strategy. Thank you.

1 More Replies
Bart_DE
by New Contributor II
  • 1307 Views
  • 2 replies
  • 0 kudos

Resolved! Concurrency behavior with merge operations

Hi community, I have a case in my current project where I have to develop a solution that prevents the same data from being ingested twice into Delta Lake. On rare occasions, some of our data suppliers send us the same dataset in two diff...

Latest Reply
Walter_C
Databricks Employee
  • 0 kudos

Your idea of using a log table to track processed ingestions and leveraging a MERGE operation in your pipeline is a sound approach for preventing duplicate data ingestion into Delta Lake. Delta Lake's ACID transactions and its support for concurrency...
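As a rough illustration of the log-table idea, here is a minimal pure-Python sketch. It assumes duplicates are identified by a hash of the dataset content, which is an assumption on my part and not something stated in the thread. The in-memory set `processed_log` and the helper names are hypothetical stand-ins for the log table; in Delta Lake the check-and-insert step would be done by the MERGE itself.

```python
import hashlib


def fingerprint(data: bytes) -> str:
    """Content hash used as the ingestion key (an assumed choice of key)."""
    return hashlib.sha256(data).hexdigest()


def ingest_once(data: bytes, processed_log: set) -> bool:
    """Record the dataset in the log and return True only on first sight."""
    key = fingerprint(data)
    if key in processed_log:
        return False  # same content was already ingested, skip it
    processed_log.add(key)
    return True


# Hypothetical stand-in for the log table of processed ingestions.
processed_log = set()

first = ingest_once(b"id,value\n1,a\n", processed_log)
resent = ingest_once(b"id,value\n1,a\n", processed_log)  # same data, sent again
other = ingest_once(b"id,value\n2,b\n", processed_log)
```

Keying on content rather than file name means a dataset re-sent under a different name is still recognized, at the cost of reading the data once to hash it.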

1 More Replies
Anonymous
by Not applicable
  • 2972 Views
  • 2 replies
  • 0 kudos

DBFS Permissions

Is there permission control at the folder/file level in DBFS? E.g., if a team member uploads a file to /Filestore/Tables/TestData/testfile, could we mask permissions on TestData and/or testfile?

Latest Reply
brickster_2018
Databricks Employee
  • 0 kudos

DBFS does not have ACL at this point

1 More Replies
sahil3
by New Contributor
  • 495 Views
  • 1 reply
  • 0 kudos

NOT ABLE TO ATTACH CLUSTER

Notebook detached. Exception when creating execution context: java.util.concurrent.TimeoutException: Timed out after 15 seconds

Latest Reply
RiyazAliM
Honored Contributor
  • 0 kudos

Hey @sahil3, try detaching and re-attaching the notebook to the cluster. Please note that this will clear the state of the notebook. If the issue persists, try restarting the cluster. Best,

rak_haq
by New Contributor III
  • 1371 Views
  • 3 replies
  • 1 kudos

Resolved! How to use read_kafka() SQL with secret()?

Hi, I want to read data from Azure Event Hub using SQL. Can someone please give me an executable example that also reads the Event Hub connection string with the SQL function secret()? This is what I tried but it Databr...

Data Engineering
azure
event_hub
kafka
sql
streaming
Latest Reply
rak_haq
New Contributor III
  • 1 kudos

I found the solution and could successfully establish a connection to Event-Hub.
SELECT cast(value as STRING) as raw_json, current_timestamp() as processing_time
FROM read_kafka(
  bootstrapServers => '<YOUR EVENT-HUB NAMESPACE>.servicebus.windows.n...

2 More Replies
Ajay-Pandey
by Databricks MVP
  • 4770 Views
  • 5 replies
  • 0 kudos

On-behalf-of token creation for service principals is not enabled for this workspace

Hi all, I wanted to create a PAT for a Databricks service principal, but I get the error below when calling the API or using the CLI. Please help me create a PAT for it. #dataengineering #databricks

AjayPandey_0-1710845262519.png AjayPandey_1-1710845276557.png
Data Engineering
community
Databricks
Latest Reply
JackB
New Contributor II
  • 0 kudos

You can generate the token while logged in as the service principal via the Azure CLI in a Command Prompt window. To do so, make sure to install the Azure CLI and the Databricks CLI with it.
Install the Azure CLI for Windows | Microsoft Learn
Install ...

4 More Replies
Harrison
by New Contributor II
  • 2233 Views
  • 1 reply
  • 0 kudos

Reading CloudWatch Logs from AWS Kinesis

If you have AWS CloudWatch subscribed to write out logs to AWS Kinesis, the Kinesis stream is base64 encoded and the CloudWatch logs are GZIP compressed. The challenge we faced was how to handle both layers in PySpark to be able to read the data. We were ...
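The two layers described above can be peeled off in plain Python as shown below. This is a minimal sketch, not the poster's actual solution (which is cut off in this preview): it assumes the payload arrives as a base64 string wrapping GZIP-compressed JSON, and the function name and sample payload are hypothetical.

```python
import base64
import gzip
import json


def decode_cloudwatch_record(b64_payload: str) -> dict:
    """Decode one base64-encoded, GZIP-compressed CloudWatch Logs payload."""
    compressed = base64.b64decode(b64_payload)  # undo the base64 layer
    raw_json = gzip.decompress(compressed)      # undo the GZIP layer
    return json.loads(raw_json)


# Hypothetical sample payload, built locally for illustration only.
sample = {"logGroup": "demo-group", "logEvents": [{"id": "1", "message": "hello"}]}
encoded = base64.b64encode(
    gzip.compress(json.dumps(sample).encode("utf-8"))
).decode("ascii")

decoded = decode_cloudwatch_record(encoded)
```

In a PySpark job, a function like this could be wrapped in a UDF and applied to the record's data column; whether the base64 step is still needed at that point depends on how the connector delivers the bytes.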

Latest Reply
oblikas
New Contributor II
  • 0 kudos

Thank you so much, this is very helpful

