Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

ankit001mittal
by New Contributor III
  • 649 Views
  • 1 replies
  • 0 kudos

DLT Pipeline Stats on Object level

Hi guys, I want to create a table to store information about each DLT pipeline at the object/table ID level: how much time each object spent waiting for resources, how long it took to run, and the number of records...

Data Engineering
dlt
system
Latest Reply
RiyazAliM
Honored Contributor
  • 0 kudos

Hi @ankit001mittal, the DLT event log gathers most of the information you've mentioned above. Here is the documentation for DLT event logs: https://docs.databricks.com/aws/en/dlt/observability. Let me know if you have any questions. Best,
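To make that concrete, here is a sketch of a query over the DLT event log. The catalog/schema/table names are placeholders, and the `details` fields are taken from the event log documentation rather than from this thread:

```python
# Build a query against the DLT event log TVF (names below are placeholders).
# `flow_progress` events carry per-table metrics in the `details` JSON column.
event_log_query = """
SELECT origin.flow_name,
       timestamp,
       details:flow_progress.metrics.num_output_rows::bigint AS num_output_rows
FROM event_log(TABLE(my_catalog.my_schema.my_dlt_table))
WHERE event_type = 'flow_progress'
ORDER BY timestamp
"""
# On a cluster you would run: spark.sql(event_log_query)
```

From there you can aggregate by `origin.flow_name` to build the per-object stats table described in the question.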

  • 0 kudos
Ekaterina_Paste
by New Contributor III
  • 19992 Views
  • 12 replies
  • 2 kudos

Resolved! Can't log in to Databricks Community Edition

I enter my valid login and password here https://community.cloud.databricks.com/login.html but it says "Invalid email address or password"

Latest Reply
Venkat124488
New Contributor II
  • 2 kudos

My Databricks cluster in Community Edition terminates every 15 seconds. Could you please help me with this issue?

  • 2 kudos
11 More Replies
madrhr
by New Contributor III
  • 4692 Views
  • 4 replies
  • 3 kudos

Resolved! SparkContext lost when running %sh script.py

I need to execute a .py file in Databricks from a notebook (with arguments, which for simplicity I exclude here). For this I am using:
%sh script.py
script.py:
from pyspark import SparkContext
def main():
    sc = SparkContext.getOrCreate()
    print(sc...

Data Engineering
%sh
.py
bash shell
SparkContext
SparkShell
Latest Reply
madrhr
New Contributor III
  • 3 kudos

I eventually got it working with a combination of:
from databricks.sdk.runtime import *
spark.sparkContext.addPyFile("/path/to/your/file")
sys.path.append("path/to/your")
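For anyone hitting the same thing, the `sys.path.append` half of that fix can be sketched in plain Python (the directory and module names here are made up for illustration); `addPyFile` additionally ships the file to the executors:

```python
import os
import sys
import tempfile

# Create a throwaway module in a temp directory to stand in for script.py.
module_dir = tempfile.mkdtemp()
with open(os.path.join(module_dir, "helper_mod.py"), "w") as f:
    f.write("ANSWER = 42\n")

# Appending the directory makes the module importable on the driver;
# spark.sparkContext.addPyFile("<path>") does the equivalent for executors.
sys.path.append(module_dir)
import helper_mod
```

Without the `addPyFile` call, an import that works on the driver can still fail inside UDFs running on the executors.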

  • 3 kudos
3 More Replies
cookiebaker
by New Contributor III
  • 3438 Views
  • 7 replies
  • 6 kudos

Resolved! Some DLT pipelines suddenly seem to use runtime 16.1 instead of 15.4 since last night (CET)

Hello, suddenly since last night some of our DLT pipelines are failing, saying that our hive_metastore control table cannot be found. All of our DLT pipelines are set up the same (serverless), and one shared compute cluster is on runtime version 15.4. For ...

Latest Reply
cookiebaker
New Contributor III
  • 6 kudos

@voo-rodrigo Hello, thanks for updating the progress on your end! I've tested as well and confirmed that the DLT can read the hive_metastore via Serverless again. 

  • 6 kudos
6 More Replies
BrendanTierney
by New Contributor II
  • 5916 Views
  • 6 replies
  • 3 kudos

Resolved! Community Edition is not allocating Cluster

I've been trying to use the Community Edition for the past 3 days without success. I go to run a notebook and it begins to allocate the cluster, but it never finishes. Sometimes it times out after 15 minutes. Waiting for cluster to start: Finding i...

Latest Reply
JD2001
New Contributor II
  • 3 kudos

I am running into the same issue as of today. It worked fine until yesterday.

  • 3 kudos
5 More Replies
ZacayDaushin
by New Contributor
  • 2606 Views
  • 3 replies
  • 0 kudos

How to access system.access.table_lineage

I try to run a SELECT from system.access.table_lineage, but I am not able to see the table. What permission do I need?

Latest Reply
Nivethan_Venkat
Contributor III
  • 0 kudos

Hi @ZacayDaushin, to query a table in the system catalog, you need SELECT permission on the table in order to query it and see the results. Best regards, Nivethan V
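As a syntax sketch (the principal below is a placeholder, and system schemas also have to be enabled by an account admin before grants take effect), the grant might look like:

```python
# GRANT statement sketch for a system table (principal name is assumed).
grant_stmt = (
    "GRANT SELECT ON TABLE system.access.table_lineage "
    "TO `user@example.com`"
)
# On Databricks you would run: spark.sql(grant_stmt)
```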

  • 0 kudos
2 More Replies
smpa01
by Contributor
  • 662 Views
  • 1 replies
  • 1 kudos

Resolved! Table name as parameter marker

I am getting an error here. When I do this:
// this works fine
declare sqlStr = 'select col1 from catalog.schema.tbl LIMIT (?)';
declare arg1 = 500;
EXECUTE IMMEDIATE sqlStr USING arg1;
// this does not
declare sqlStr = 'select col1 from (?) LIMIT (?)';...

Latest Reply
lingareddy_Alva
Honored Contributor III
  • 1 kudos

@smpa01 In SQL EXECUTE IMMEDIATE, you can only parameterize values, not identifiers like table names, column names, or database names. That is, placeholders (?) can only replace constant values, not object names (tables, schemas, columns, etc.). SELECT...
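One workaround, sketched below under the assumption that the table name comes from a variable: validate the identifier in the driver language and splice it into the SQL text yourself, keeping the `?` marker only for the value. (Databricks SQL also offers an `IDENTIFIER()` clause that can take a parameter for object names; check the docs for your runtime.)

```python
import re

# Allow only `name`, `schema.name`, or `catalog.schema.name` identifiers.
# This regex is an assumption; tighten it to match your naming rules.
IDENT = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*(\.[A-Za-z_][A-Za-z0-9_]*){0,2}$")

def build_query(table: str) -> str:
    """Splice a validated table name into the SQL; LIMIT stays a ? marker."""
    if not IDENT.match(table):
        raise ValueError(f"unsafe identifier: {table!r}")
    return f"select col1 from {table} LIMIT (?)"
```

You can then pass the built string to EXECUTE IMMEDIATE and bind only the LIMIT value with USING, as in the working example above.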

  • 1 kudos
BF7
by Contributor
  • 910 Views
  • 2 replies
  • 2 kudos

Resolved! Using cloudFiles.inferColumnTypes with inferSchema and without defining schema checkpoint

Two issues: 1. What is the behavior of cloudFiles.inferColumnTypes with and without cloudFiles.inferSchema? Why would you use both? 2. When can cloudFiles.inferColumnTypes be used without a schema checkpoint? How does that affect the behavior of cloud...

Latest Reply
Louis_Frolio
Databricks Employee
  • 2 kudos

Behavior of cloudFiles.inferColumnTypes with and without cloudFiles.inferSchema: when cloudFiles.inferColumnTypes is enabled, Auto Loader attempts to identify the appropriate data types for columns instead of defaulting everything to strings, which i...
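A minimal sketch of how those options fit together in an Auto Loader read (the format, paths, and values below are assumptions for illustration, not taken from the thread):

```python
# Auto Loader option sketch: infer real column types instead of all strings,
# and persist the inferred schema at a checkpoint ("schema location").
autoloader_opts = {
    "cloudFiles.format": "json",
    "cloudFiles.inferColumnTypes": "true",
    "cloudFiles.schemaLocation": "/tmp/checkpoints/schema",
}
# On a cluster:
# spark.readStream.format("cloudFiles").options(**autoloader_opts).load(path)
```

Dropping `cloudFiles.schemaLocation` is what removes the schema checkpoint; the inferred schema is then not persisted across restarts.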

  • 2 kudos
1 More Replies
p_romm
by New Contributor III
  • 979 Views
  • 4 replies
  • 0 kudos

Structured Streaming writeStream - Query is no longer active causes task to fail

Hi, I execute readStream/writeStream in a workflow task. The write stream uses the .trigger(availableNow=True) option. After writeStream, I wait for the query to finish with query.awaitTermination(). However, from time to time the pipeline ends with "Query <id> is no ...

Latest Reply
cmathieu
New Contributor III
  • 0 kudos

@Alberto_Umana this bug was apparently fixed a few months ago, but we're still facing the same issue on our end. 

  • 0 kudos
3 More Replies
397973
by New Contributor III
  • 848 Views
  • 1 replies
  • 1 kudos

Resolved! Several unavoidable for loops are slowing this PySpark code. Is it possible to improve it?

Hi. I have a PySpark notebook that takes 25 minutes to run, as opposed to one minute with on-prem Linux + Pandas. How can I speed it up? It's not a volume issue. The input is around 30k rows. Output is the same because there's no filtering or aggregation...

Latest Reply
lingareddy_Alva
Honored Contributor III
  • 1 kudos

@397973 Spark is optimized for hundreds of GB or millions of rows, not small in-memory lookups with heavy control flow (unless engineered carefully). That's why Pandas is much faster for your specific case. Pre-load and broadcast all mappings: instead of...
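The pre-load idea reduces to: build each small mapping once, then do O(1) lookups instead of a Spark action per row. A plain-Python sketch with made-up sample data; in PySpark you would wrap the dict with `sc.broadcast` or, better still, express it as a broadcast join:

```python
# Build the lookup once, outside any loop (stand-in for a small mapping table).
mapping_rows = [("A", 1), ("B", 2), ("C", 3)]
mapping = dict(mapping_rows)

# Per-row work is now a dict lookup, not a Spark query per iteration.
data = ["A", "C", "A", "B"]
resolved = [mapping.get(key) for key in data]
```

At 30k rows, doing the whole job driver-side like this (or in Pandas) is often the right call; Spark only wins once the data no longer fits in one machine's memory.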

  • 1 kudos
Lo
by New Contributor II
  • 1459 Views
  • 1 replies
  • 0 kudos

SocketTimeoutException when creating execution context in Databricks Community Edition

Hello, I'm experiencing an issue in Databricks Community Edition. When I try to run a notebook, I get this error: "Exception when creating execution context: java.net.SocketTimeoutException: connect Timeout". What I have tried: - Restarting the cluster - Ch...

Latest Reply
Advika
Databricks Employee
  • 0 kudos

Hello @Lo! There is a similar thread where another user encountered the same issue and shared a solution that worked for them. I suggest reviewing that thread to see if the solution is helpful in your case as well.

  • 0 kudos
vidya_kothavale
by Contributor
  • 1031 Views
  • 1 replies
  • 1 kudos

Issue reading Vertica table into Databricks - Numeric value out of range

I am trying to read a Vertica table into a Spark DataFrame using JDBC in Databricks. Here is my sample code:
hostname = ""
username = ""
password = ""
database_port = ""
database_name = ""
qry_col_level = f"""SELECT * FROM analytics_DS.ansh_units_cum_dash""...

Latest Reply
Renu_
Valued Contributor II
  • 1 kudos

Hi @vidya_kothavale, based on my research and understanding, Databricks and Spark's JDBC connectors currently don’t offer an automatic way to truncate or round high precision decimal values when loading data. To handle this, you would need to either:...
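One of those options, sketched here, is to push a CAST down to Vertica so values arrive at a precision Spark's DecimalType can represent; the table, column, and precision below are assumptions based on the post, not verified against the actual schema:

```python
# Pushdown subquery for the JDBC `dbtable` option (names/precision assumed).
# Casting in Vertica avoids "Numeric value out of range" on the Spark side.
pushdown_query = """
(SELECT CAST(units_cum AS DECIMAL(38, 10)) AS units_cum
 FROM analytics_DS.ansh_units_cum_dash) AS t
"""
# spark.read.format("jdbc").option("dbtable", pushdown_query) ... .load()
```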

  • 1 kudos
kweks970
by New Contributor
  • 2640 Views
  • 1 replies
  • 0 kudos

DEV and PROD

A "SELECT * FROM" call on my table in PROD returns all rows (historical data), but the same call on my table in DEV returns just one row (the current row of the historical data). What could be the problem?

Latest Reply
Louis_Frolio
Databricks Employee
  • 0 kudos

Please don't cross post.  Thanks, Louis.

  • 0 kudos
AlexMc
by New Contributor III
  • 1148 Views
  • 6 replies
  • 1 kudos

Resolved! GET /api/2.2/jobs/list Ordering

Hi there! I am calling the job list API (via the Python SDK): GET /api/2.2/jobs/list (docs.databricks.com/api/workspace/jobs/list). Does anyone know what ordering is applied to the list of jobs? Is it consistent or random? Is it by creation tim...

Latest Reply
AlexMc
New Contributor III
  • 1 kudos

Thanks both - this was very helpful!

  • 1 kudos
5 More Replies
Christian_C
by New Contributor II
  • 1448 Views
  • 7 replies
  • 0 kudos

Google Pub Sub and Delta live table

I am using Delta Live Tables and Pub/Sub to ingest messages from 30 different topics in parallel. I noticed that initialization time can be very long, around 15 minutes. Does someone know how to reduce initialization time in DLT? Thank you.

Latest Reply
Louis_Frolio
Databricks Employee
  • 0 kudos

Classic clusters can take up to seven minutes to be acquired, configured, and deployed, with most of this time spent waiting for the cloud service to allocate virtual machines. In contrast, serverless clusters typically start in under eight seconds. ...

  • 0 kudos
6 More Replies
