Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
Data + AI Summit 2024 - Data Engineering & Streaming

Forum Posts

aahil824
by Visitor
  • 44 Views
  • 2 replies
  • 0 kudos

How to read zip folder that contains 4 .csv files

Hello Community, I have uploaded one zip folder, "dbfs:/FileStore/tables/bike_sharing.zip". I was trying to unzip the folder and read the 4 .csv files, but I was unable to do it. Any help would be greatly appreciated!

Latest Reply
Yugeshg
New Contributor II
  • 0 kudos

Hi, what is the error message? And is your container mounted or unmounted?

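For reference, here is a minimal sketch of the unzip-then-read pattern the post asks about. It builds a small stand-in archive in a temp directory to show the mechanics; on Databricks, the DBFS path from the post ("dbfs:/FileStore/tables/bike_sharing.zip") is typically reachable from Python as "/dbfs/FileStore/tables/bike_sharing.zip".

```python
# Sketch: unzip an archive of CSVs so they can be read afterwards.
import os
import tempfile
import zipfile

tmp = tempfile.mkdtemp()
zip_path = os.path.join(tmp, "bike_sharing.zip")

# Stand-in for the uploaded archive: four tiny CSV files.
with zipfile.ZipFile(zip_path, "w") as zf:
    for name in ("trips", "stations", "weather", "bikes"):
        zf.writestr(f"{name}.csv", "id,value\n1,a\n")

# Extract next to the archive, then collect the CSV paths.
out_dir = os.path.join(tmp, "bike_sharing")
with zipfile.ZipFile(zip_path) as zf:
    zf.extractall(out_dir)

csv_files = sorted(f for f in os.listdir(out_dir) if f.endswith(".csv"))
print(csv_files)

# On Databricks you could then read them all at once, e.g.:
# df = spark.read.option("header", "true").csv(f"file:{out_dir}/*.csv")
```

The Spark line is commented out because `spark` only exists inside a Databricks (or similar) session.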
brickster_2018
by Databricks Employee
  • 7983 Views
  • 2 replies
  • 0 kudos
Latest Reply
lchari
Visitor
  • 0 kudos

Is the limit per table/dataframe, or for all tables/dataframes put together? The driver collects the data from all executors (which hold the respective table or dataframe) and distributes it to all executors. When will the memory be released in bo...

ChristianRRL
by Valued Contributor
  • 18 Views
  • 0 replies
  • 0 kudos

PKEY Upserting Pattern With Older Runtimes

Hi there, I'm aware that newer Databricks runtimes support some great features, including primary and foreign key constraints. I'm wondering: if we have clusters running older runtime versions, are there upserting patterns that ha...

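For later readers: Delta Lake's MERGE INTO has long been available and does not require the newer PK/FK constraint features, so it is the usual upsert tool on older runtimes as well. As a sketch only, the match-on-key semantics that MERGE provides can be modeled in plain Python (names here are illustrative, not an API):

```python
# Sketch: MERGE-style upsert semantics, keyed on a primary-key column.
# `target` models a table as {key_value: row}; matched keys are updated,
# unmatched keys are inserted -- the same contract as WHEN MATCHED /
# WHEN NOT MATCHED in a Delta MERGE.
def upsert(target: dict, updates: list, key: str) -> dict:
    merged = dict(target)
    for row in updates:
        merged[row[key]] = row  # matched -> overwrite, else -> insert
    return merged

target = {1: {"id": 1, "name": "old"}, 2: {"id": 2, "name": "keep"}}
updates = [{"id": 1, "name": "new"}, {"id": 3, "name": "added"}]
result = upsert(target, updates, "id")
print(result)
```

In Spark SQL the equivalent shape is `MERGE INTO target USING updates ON target.id = updates.id WHEN MATCHED THEN UPDATE SET * WHEN NOT MATCHED THEN INSERT *`.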
ChristianRRL
by Valued Contributor
  • 87 Views
  • 4 replies
  • 4 kudos

Resolved! Spark SQL: USING JSON to create VIEWS/TABLES with existing schema file

Hi there, I'm trying to understand whether there's an easy way to create VIEWS and TABLES (I'm interested in both) *with* a provided schema file. For example, I understand that via dataframes I can accomplish this with something like: df = spark.read.sc...

Latest Reply
szymon_dybczak
Contributor III
  • 4 kudos

Hi @ChristianRRL, (A) You're not missing anything; there's no such option as of today for the SQL API. (B) It would be much better to just use PySpark, but if you have to stick to the SQL API you can use the following approach. Define your schema...

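One way to bridge a schema file into the SQL API is to generate the DDL column list from it. This is a sketch only, and it assumes the file uses Spark's StructType JSON layout (`{"fields": [{"name": ..., "type": ...}, ...]}`); a real file would be read from storage rather than inlined.

```python
# Sketch: derive a DDL column list from a StructType-style JSON schema.
# The resulting string can be spliced into a CREATE TABLE ... USING CSV
# statement when only the SQL API is available.
import json

schema_json = """
{"type": "struct", "fields": [
  {"name": "id",   "type": "integer", "nullable": true, "metadata": {}},
  {"name": "name", "type": "string",  "nullable": true, "metadata": {}}
]}
"""

fields = json.loads(schema_json)["fields"]
ddl = ", ".join(f'{f["name"]} {f["type"].upper()}' for f in fields)
print(ddl)  # id INTEGER, name STRING
```

Note that complex Spark types (structs, arrays, decimals) need more mapping than this simple upper-casing.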
JissMathew
by New Contributor
  • 91 Views
  • 3 replies
  • 0 kudos

Reading a csv file

While trying to read a CSV file into a dataframe using the CSV file format, loading fails with formatting and column errors. The code I used: df = spark.read.format("csv") \    .option("header", "true") ...

Latest Reply
JissMathew
New Contributor
  • 0 kudos

@MuthuLakshmi Actually, in the "address" column we need "kochi", but due to a column mismatch it ends up in the "name" column; that is the error.

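The column-shift symptom described above usually means some rows have more delimiters than the header (for example an unquoted comma inside an address). A quick, hedged sketch of how to surface such rows before loading, using only the standard library and made-up sample data:

```python
# Sketch: find rows whose field count disagrees with the header.
# Such rows are the ones whose values shift into the wrong columns.
import csv
import io

raw = "name,address\nJiss,kochi\nAnu,kochi,extra\n"  # last row is malformed

reader = csv.reader(io.StringIO(raw))
header = next(reader)
bad_rows = [
    (lineno, row)
    for lineno, row in enumerate(reader, start=2)
    if len(row) != len(header)
]
print(bad_rows)  # [(3, ['Anu', 'kochi', 'extra'])]
```

In Spark itself, `.option("mode", "PERMISSIVE")` together with a `_corrupt_record` column gives a similar view of malformed lines.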
SagarJi
by New Contributor II
  • 27 Views
  • 1 reply
  • 0 kudos

Is there any impact on data ingestion and data extract while REORG TABLE is in progress

While using Delta Lake for an eventing system with repeated updates, merges, etc., we use deletion vectors to improve performance. With that comes the "REORG TABLE" maintenance task. My question is: in an ingestion- and extract-heavy system, when we condu...

Latest Reply
Walter_C
Databricks Employee
  • 0 kudos

It is advisable to schedule REORG TABLE operations during periods of low activity to minimize disruptions to both data ingestion and extraction processes. This can potentially affect ongoing data ingestion processes because the table's underlying file...

guangyi
by Contributor III
  • 58 Views
  • 1 reply
  • 0 kudos

What is the correct way to measure the performance of a Databrick notebook?

Here is my code for converting one column of a dataframe to a time data type:  col_value = df.select(df.columns[0]).first()[0] start_time = time.time() col_value = datetime.strftime(col_value, "%Y-%m-%d %H:%M:%S") \ if isinstance(co...

Latest Reply
Walter_C
Databricks Employee
  • 0 kudos

The behavior you're observing is likely due to a combination of factors related to how Python executes code and how time is measured. Let's break down the issues and provide some recommendations for more accurate timing. Resolution of time.time(): The...

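The reply's advice can be illustrated with a short, self-contained sketch: `time.perf_counter()` has much finer resolution than `time.time()`, and repeating a very fast operation amortizes measurement noise. The statement being timed here stands in for the post's `strftime` conversion.

```python
# Sketch: timing a fast statement with perf_counter and repetition.
import time
from datetime import datetime

N = 10_000
start = time.perf_counter()
for _ in range(N):
    s = datetime(2024, 1, 1, 12, 0, 0).strftime("%Y-%m-%d %H:%M:%S")
elapsed = time.perf_counter() - start

per_call = elapsed / N
print(f"{per_call * 1e6:.2f} microseconds per call")
```

In a notebook, the `%timeit` magic does the same repetition-and-average dance automatically.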
Phani1
by Valued Contributor II
  • 32 Views
  • 1 reply
  • 0 kudos

Cluster idle time and usage details

How can we find out the usage details of the Databricks cluster? Specifically, we need to know how many nodes are in use, how long the cluster is idle, the time it takes to start up, and the jobs it is running along with their durations. Is there a q...

Latest Reply
Walter_C
Databricks Employee
  • 0 kudos

You can try the following query on your system tables: WITH cluster_usage AS ( SELECT u.usage_metadata.cluster_id, u.usage_date, u.usage_start_time, u.usage_end_time, DATEDIFF(second, u.usage_start_time, u.usage_end_time) AS du...

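The `DATEDIFF(second, usage_start_time, usage_end_time)` term in the reply's query is just an end-minus-start subtraction; a quick sketch with illustrative timestamps (the values are made up) shows the arithmetic for sanity-checking durations pulled from system tables:

```python
# Sketch: the DATEDIFF(second, start, end) arithmetic in Python.
from datetime import datetime

usage_start = datetime(2024, 11, 14, 9, 0, 0)
usage_end = datetime(2024, 11, 14, 9, 42, 30)
duration_seconds = int((usage_end - usage_start).total_seconds())
print(duration_seconds)  # 2550
```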
cristianc
by Contributor
  • 32 Views
  • 1 reply
  • 1 kudos

Resolved! Does Databricks support AWS S3 Express One Zone?

Greetings, I'm writing this message since I learned that AWS has a storage class faster than S3 Standard, called "S3 Express One Zone" (https://aws.amazon.com/s3/storage-classes/express-one-zone/). AWS offers support for this storage class with ...

Latest Reply
Walter_C
Databricks Employee
  • 1 kudos

Right now there is no support for S3 Express One Zone, but this is already on our radar through idea DB-I-8058. It is currently tagged as "Considered for the future"; there is no ETA, but our teams are working to have this supported in the near future.

TX-Aggie-00
by New Contributor
  • 100 Views
  • 4 replies
  • 0 kudos

Installing linux packages on cluster

Hey everyone!  We need to use LibreOffice in one of our automated tasks via a notebook. I have tried installing it via an init script attached to the cluster, but sometimes the program gets installed and sometimes it doesn't. For obviou...

Latest Reply
TX-Aggie-00
New Contributor
  • 0 kudos

Thanks Alberto!  There were 42 .deb files, so I just changed my script to: sudo dpkg -i /dbfs/Volumes/your_catalog/your_schema/your_volume/*.deb The init script log shows that it unpacks everything, sets them up, and the processes trigger, but the packa...

dyusuf
by New Contributor
  • 34 Views
  • 1 reply
  • 0 kudos

Unable to install kafka in community edition

Hi, I am trying to install Kafka in Databricks Community Edition after downloading it, using the commands below in a notebook: %sh cd kafka_2.12-3.8.1/ ls -ltr ./bin/zookeeper-server-start.sh config/zookeeper.properties Below is the error log. Kindly help.

Latest Reply
Walter_C
Databricks Employee
  • 0 kudos

It seems like the issue might be related to a port conflict, as indicated by the java.net.BindException: Address already in use error. You might want to check whether another instance of ZooKeeper, or another service, is using the same port.

Dean_Lovelace
by New Contributor III
  • 17750 Views
  • 11 replies
  • 2 kudos

How can I deploy workflow jobs to another databricks workspace?

I have created a number of workflows in the Databricks UI and now need to deploy them to a different workspace. How can I do that? Code can be deployed via Git, but the job definitions are stored only in the workspace.

Latest Reply
cpradeep
New Contributor II
  • 2 kudos

@Dean_Lovelace did you implement the solution? Please share how you implemented CI/CD for the workflow.

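For later readers: Databricks Asset Bundles are one supported way to keep job definitions in source control and deploy them to multiple workspaces via targets. A minimal sketch (bundle name, job name, notebook path, and workspace URLs are all placeholders):

```yaml
# databricks.yml -- minimal sketch; names and hosts are placeholders
bundle:
  name: my_workflows

resources:
  jobs:
    nightly_etl:
      name: nightly_etl
      tasks:
        - task_key: run_notebook
          notebook_task:
            notebook_path: ./notebooks/etl.py

targets:
  dev:
    workspace:
      host: https://dev-workspace.cloud.databricks.com
  prod:
    workspace:
      host: https://prod-workspace.cloud.databricks.com
```

Deploying to a given workspace is then `databricks bundle deploy -t prod`. Existing UI-created jobs can be exported to this format (for example with the CLI's job-get/generate tooling) rather than rewritten by hand.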
ChsAIkrishna
by New Contributor II
  • 61 Views
  • 2 replies
  • 1 kudos

Resolved! Databricks Workflow dbt-core job failure with Connection aborted

When we use a dbt-core task on a Databricks workflow, roughly one in every 100 workflow executions fails with the reason below; after a reboot it works well. What would be the permanent remediation? ('Connection aborted.', RemoteDisconnected('Remote end ...

Latest Reply
ChsAIkrishna
New Contributor II
  • 1 kudos

@Walter_C Kudos to you, thank you very much. We set the "connect retries" parameter; let's see. Ref: https://docs.getdbt.com/docs/core/connect-data-platform/databricks-setup#additional-parameters

Sans
by New Contributor III
  • 2514 Views
  • 8 replies
  • 3 kudos

Unable to create new compute in community databricks

Hi Team, I am unable to create compute in Databricks Community Edition due to the error below. Please advise. Bootstrap Timeout: Node daemon ping timeout in 780000 ms for instance i-0ab6798b2c762fb25 @ 10.172.246.217. Please check network connectivity between the ...

Latest Reply
Rakeshkikani
New Contributor
  • 3 kudos

I am facing the same issue as well.

sumitdesai
by New Contributor II
  • 4163 Views
  • 3 replies
  • 3 kudos

How to reuse a cluster with Databricks Asset bundles

I am using Databricks Asset Bundles as an IaC tool with Databricks. I want to create a cluster using DAB and then reuse the same cluster in multiple jobs. I cannot find an example of this; all the examples I found specify individual...

Latest Reply
felix_
New Contributor II
  • 3 kudos

Hi, would it also be possible to reuse the same job cluster for multiple "Run Job" Tasks?

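For context on the question above: a job cluster is scoped to the job that defines it, so it cannot be shared across separate jobs (that generally requires an all-purpose cluster via `existing_cluster_id`). Within a single job, however, multiple tasks can share one cluster by declaring it under `job_clusters` and referencing it by `job_cluster_key`. A sketch with illustrative cluster values:

```yaml
# Sketch: one job cluster shared by several tasks in the same job.
# spark_version, node_type_id, and paths are illustrative only.
resources:
  jobs:
    my_job:
      name: my_job
      job_clusters:
        - job_cluster_key: shared_cluster
          new_cluster:
            spark_version: 15.4.x-scala2.12
            node_type_id: i3.xlarge
            num_workers: 2
      tasks:
        - task_key: step_one
          job_cluster_key: shared_cluster
          notebook_task:
            notebook_path: ./notebooks/step_one.py
        - task_key: step_two
          depends_on:
            - task_key: step_one
          job_cluster_key: shared_cluster
          notebook_task:
            notebook_path: ./notebooks/step_two.py
```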

Connect with Databricks Users in Your Area

Join a Regional User Group to connect with local Databricks users. Events will be happening in your city, and you won’t want to miss the chance to attend and share knowledge.

If there isn’t a group near you, start one and help create a community that brings people together.

Request a New Group