Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

ChristianRRL
by Valued Contributor
  • 960 Views
  • 5 replies
  • 0 kudos

CREATE view USING json and *include* _metadata, _rescued_data

Title may be self-explanatory. Basically, I'm curious to ask if it's possible (and if so how) to add `_metadata` and `_rescued_data` fields to a view "using json", e.g. %sql CREATE OR REPLACE VIEW entity_view USING json OPTIONS (path="/.../.*json",mu...

Latest Reply
NandiniN
Databricks Employee
  • 0 kudos

I am able to perform the below operation for a Delta table: SELECT *, _metadata.file_name FROM anytable WHERE condition. https://docs.databricks.com/en/ingestion/file-metadata-column.html You can use something like df = spark.read \ .format("json")...

4 More Replies
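Building on the reply above, a minimal sketch of how both columns can be surfaced when reading JSON on Databricks; the landing path and view name are placeholders, and a temp view is used rather than a persisted CREATE VIEW ... USING json:

df = (spark.read
      .format("json")
      .option("rescuedDataColumn", "_rescued_data")   # Databricks option that adds the rescued-data column
      .load("/path/to/landing/")
      .select("*", "_metadata"))                      # file metadata column (file_name, file_path, ...)

df.createOrReplaceTempView("entity_view")             # temp view; not the same as CREATE VIEW ... USING json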
dsnde49
by New Contributor
  • 317 Views
  • 1 reply
  • 0 kudos

Unable to locate saved data

Hi, I have been trying to save some processed data using pandas from my Databricks notebook. I have tried two versions, using csv and xlsx. The code for both of them runs without any error, but I'm unable to find the location of the saved data. for tabl...

Latest Reply
hari-prasad
Valued Contributor II
  • 0 kudos

Hi @dsnde49, when using Python instead of Spark in Databricks, the data you write will be stored in the driver's local storage. To avoid this, you can utilize the spark-excel jar from Crealytics (Maven Repository: com.crealytics » spark-excel). This to...

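As a complement to the reply above, a small sketch of where pandas output ends up; the volume path below is a placeholder and must exist in your workspace:

import pandas as pd

pdf = pd.DataFrame({"id": [1, 2, 3]})

# Writes only to the driver's local filesystem -- gone once the cluster terminates:
pdf.to_csv("/tmp/output.csv", index=False)

# Writing to a Unity Catalog volume (or a /dbfs/... path) keeps the file in shared storage:
pdf.to_csv("/Volumes/<catalog>/<schema>/<volume>/output.csv", index=False)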
VicS
by New Contributor III
  • 665 Views
  • 3 replies
  • 0 kudos

Using Databricks asset bundles with typer instead of argparse

I want to use Databricks asset bundles - I'd like to use `typer` as a CLI tool, but I have only been able to set it up with `argparse`. Argparse seems to be able to retrieve the arguments from the Databricks task, but not typer. I specified two entryp...

Latest Reply
NandiniN
Databricks Employee
  • 0 kudos

If switching to parameters does not resolve the issue, you might need to further debug by adding logging statements in your typer entry point to see how the parameters are being received.

2 More Replies
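A minimal sketch of the debugging approach suggested in the reply; the command parameters here are made up for illustration, not taken from the original bundle:

import logging
import sys
import typer

logging.basicConfig(level=logging.INFO)
app = typer.Typer()

@app.command()
def main(input_path: str, output_path: str):
    # Log exactly what the Databricks task handed to the process, to compare with the argparse behaviour
    logging.info("sys.argv as received: %s", sys.argv)
    logging.info("input_path=%s output_path=%s", input_path, output_path)

if __name__ == "__main__":
    app()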
JissMathew
by Contributor III
  • 1910 Views
  • 2 replies
  • 1 kudos

auto loader

source_df = (spark.readStream .format("cloudFiles") .option("cloudFiles.format", "csv") .option("header", "true") .option("timestampFormat", "d-M-y H.m") .option("cloudFi...

Latest Reply
NandiniN
Databricks Employee
  • 1 kudos

In case of failures, Auto Loader can resume from where it left off using information stored in the checkpoint location, and it continues to provide exactly-once guarantees when writing data into Delta Lake. You don't need to maintain or manage any state your...

1 More Replies
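A hedged sketch of the full pattern the reply describes; all paths and the target table name are placeholders:

source_df = (spark.readStream
             .format("cloudFiles")
             .option("cloudFiles.format", "csv")
             .option("header", "true")
             .option("cloudFiles.schemaLocation", "/Volumes/<catalog>/<schema>/<volume>/_schemas")
             .load("/Volumes/<catalog>/<schema>/<volume>/landing/"))

(source_df.writeStream
 .option("checkpointLocation", "/Volumes/<catalog>/<schema>/<volume>/_checkpoints/bronze")  # progress stored here enables resume and exactly-once delivery
 .trigger(availableNow=True)
 .toTable("<catalog>.<schema>.bronze_table"))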
Nis
by New Contributor II
  • 5720 Views
  • 11 replies
  • 5 kudos

Resolved! can we commit offset in spark structured streaming in databricks.

We are storing offset details in the checkpoint location and wanted to know whether there is a way to commit the offset once we consume the message from Kafka.

Latest Reply
raphaelblg
Databricks Employee
  • 5 kudos

@dmytro yes, but this feature is currently in Private Preview. Please submit a support case in https://help.databricks.com/s/ if you have interest in trying out this new feature.

10 More Replies
chaosBEE
by New Contributor II
  • 2042 Views
  • 5 replies
  • 1 kudos

StructField Metadata Dictionary - What are the possible keys?

I have a Delta Live Table which is being deposited to Unity Catalog. In the Python notebook, I am defining the schema with a series of StructFields, for example: StructField( "columnName", StringType(), True, metadata = { 'comme...

Latest Reply
ipreston
New Contributor III
  • 1 kudos

Bump, I've got the same issue. Looks like there was a partial reply from Kaniz, but I can't see it in this thread.

4 More Replies
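For reference, a small sketch of the pattern being asked about; 'comment' is the key commonly surfaced as a column comment, and treating other keys as free-form pass-through metadata is an assumption rather than documented behaviour:

from pyspark.sql.types import StructType, StructField, StringType

schema = StructType([
    StructField(
        "columnName",
        StringType(),
        True,
        # 'comment' is commonly surfaced as the column comment; other keys are kept as opaque metadata (assumption)
        metadata={"comment": "Business meaning of this column"},
    )
])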
BNV
by New Contributor II
  • 869 Views
  • 10 replies
  • 0 kudos

Translating SQL Value Function For XML To Databricks SQL

Trying to translate this line of a SQL query that evaluates XML to Databricks SQL: SELECT MyColumn.value('(/XMLData/Values/ValueDefinition[@colID="10"]/@Value)[1]', 'VARCHAR(max)') as Color. The XML looks like this: <XMLData><Values><ValueDefinition c...

Latest Reply
hari-prasad
Valued Contributor II
  • 0 kudos

Yes, XML parsing is now supported directly in Databricks Runtime 14.3 or higher; in earlier versions you could have leveraged the spark-xml library jars to parse it. You can still leverage xpath in cases where a data column holds an XML value in a dataset. As @BNV is look...

9 More Replies
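A hedged sketch of the xpath route mentioned above, using Spark SQL's built-in xpath_string() from Python; the table name is a placeholder and the XPath is adapted from the original query:

# Assumes MyColumn holds the XML document as a string; 'my_table' is a placeholder.
result = spark.sql("""
    SELECT xpath_string(
             MyColumn,
             '/XMLData/Values/ValueDefinition[@colID="10"]/@Value'
           ) AS Color
    FROM my_table
""")
result.show()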
aliacovella
by Contributor
  • 350 Views
  • 1 reply
  • 1 kudos

Resolved! How can I dedupe from a table created from a Kinesis change data capture feed.

Here I have a table named organizations_silver that was built from a bronze table created from a Kinesis change data capture feed. @dlt.table(name="kinesis_raw_stream", table_properties={"pipelines.reset.allowed": "false"}) def kinesis_raw_stream(): ...

Latest Reply
Alberto_Umana
Databricks Employee
  • 1 kudos

Hello @aliacovella, Looks like there are duplicate records in your source table that match the same target record. This is indeed the case since your source table, organizations_silver, contains duplicates due to the append-only nature of the Kinesis...

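Following on from the explanation above, a hedged sketch of keeping only the latest change per key before merging downstream; the key and timestamp column names are assumptions, not taken from the original pipeline:

from pyspark.sql import functions as F
from pyspark.sql.window import Window

# Keep only the most recent CDC record per organization before it reaches the merge.
w = Window.partitionBy("org_id").orderBy(F.col("event_timestamp").desc())

deduped = (spark.table("organizations_silver")
           .withColumn("rn", F.row_number().over(w))
           .filter("rn = 1")
           .drop("rn"))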
krishnachaitany
by New Contributor II
  • 5232 Views
  • 3 replies
  • 4 kudos

Resolved! Spot instance in Azure Databricks

When I run a job using spot instances, I would like to know how many workers are using spot and how many workers are using on-demand instances for a given job run. In order to identify the spot instances we got for any...

Latest Reply
drumcircle
New Contributor II
  • 4 kudos

This remains a challenge using system tables.

2 More Replies
TimB
by New Contributor III
  • 6742 Views
  • 2 replies
  • 0 kudos

Create external table using multiple paths/locations

I want to create an external table from more than a single path. I have configured my storage creds and added an external location, and I can successfully create a table using the following code: create table test.base.Example using csv options ( h...

Latest Reply
NandiniN
Databricks Employee
  • 0 kudos

You do not have to create all the partition folders yourself. You just need to specify the parent folder, like CREATE OR REPLACE TABLE <catalog>.<schema>.<table-name> USING <format> PARTITIONED BY (<partition-column-list>) LOCATION 's3://<bucket-path...

1 More Replies
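A hedged, filled-in version of the pattern in the reply; the catalog, format, partition column, and bucket path below are placeholders:

# Point the table at the parent folder; partition subfolders underneath are handled by partition discovery.
spark.sql("""
    CREATE OR REPLACE TABLE main.base.example
    USING CSV
    OPTIONS (header = 'true')
    PARTITIONED BY (event_date)
    LOCATION 's3://my-bucket/example/'
""")
# If existing partition folders are not registered automatically, MSCK REPAIR TABLE main.base.example may help (assumption).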
CDICSteph
by New Contributor
  • 3290 Views
  • 5 replies
  • 0 kudos

permission denied listing external volume when using vscode databricks extension

Hey, I'm using the Databricks extension for VS Code (Databricks Connect v2). When using dbutils to list an external volume defined in UC like so: dbutils.fs.ls("/Volumes/dev/bronze/rawdatafiles/") I get this error: "databricks.sdk.errors.mapping.PermissionD...

Latest Reply
NandiniN
Databricks Employee
  • 0 kudos

Great, thanks for confirming. This feature was under development early last year; it is now available.

4 More Replies
SeliLi_52097
by New Contributor III
  • 4127 Views
  • 5 replies
  • 5 kudos

Databricks Academy webpage showing insecure connection (in Chrome)

When I was trying to visit the Databricks Academy website https://customer-academy.databricks.com, it showed an insecure connection as below. This happened on 8 January 2023 (AEDT) around 12:30pm.

Latest Reply
barendlinders
New Contributor II
  • 5 kudos

Certificate has expired again... 

4 More Replies
gfar
by New Contributor II
  • 15256 Views
  • 13 replies
  • 5 kudos

Is it possible to connect QGIS to Databricks using ODBC?

I can connect ArcGIS to Databricks using ODBC, but using the same ODBC DSN for QGIS I get an error - Unable to initialize ODBC connection to DSN. Has anyone got this working?

Latest Reply
fgoulet
New Contributor III
  • 5 kudos

That should probably help, but I tried and my table has 0 rows when the same table loaded with all the schema analyzed has 835... Still have testing to do, but with that, you can now choose a single file to add using the connection string ODBC:token/yo...

12 More Replies
rgomez
by New Contributor
  • 721 Views
  • 2 replies
  • 1 kudos

Install notebook dependency via terraform for serverless notebook tasks

I am trying to install a wheel file as a dependency for a serverless notebook task via Terraform. According to https://docs.databricks.com/en/compute/serverless/dependencies.html, dependencies in serverless notebooks can be configured via the base e...

Latest Reply
Walter_C
Databricks Employee
  • 1 kudos

Currently, the databricks_job resource in Terraform does not support configuring the environment for notebook tasks directly. You can upload the YAML file and configure the environment as mentioned in https://docs.databricks.com/en/compute/serverless...

1 More Replies
TjommeV-Vlaio
by New Contributor III
  • 1163 Views
  • 10 replies
  • 0 kudos

Which process is eating up my driver memory?

Hi, we're running DBR 14.3 on a shared multi-node cluster. When checking the metrics of the driver, I see that the memory utilization and memory swap utilization are increasing a lot and are almost never decreasing. Even if no processes are running any...

Latest Reply
-werners-
Esteemed Contributor III
  • 0 kudos

At the OS level you will not see notebooks; you will see the memory consumption of the Spark application (so this is all notebooks). For that there is the Spark UI. I'd look for collect() and broadcast() statements, Python code outside of Spark, tons of graphics...

9 More Replies
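To illustrate the kind of pattern the reply points at, a small hypothetical contrast; the table names are made up, not taken from the original notebooks:

big_df = spark.table("some_large_table")

# Pulls the entire result set into driver memory -- a common cause of steadily growing driver RAM:
rows = big_df.collect()

# Keeping the work distributed (writing, aggregating, sampling) avoids loading everything onto the driver:
big_df.write.mode("overwrite").saveAsTable("some_large_table_copy")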
